
Unclassified 


Scott & Wickens


A Comparison of Verbal and Graphical Information Presentation

in a Complex Information Integration Decision Task

Christopher  D.  Wickens  and  Brad  D.  Scott 
Abstract 


This report describes an experiment conducted to evaluate the
relative merits of verbal as opposed to spatial-graphical display formats
in presenting sequential information to subjects in a simulated C3
tactical decision making task. The task required subjects to integrate a
series of information messages bearing on the likelihood that one of two
hypotheses pertaining to tactical battlefield maneuvers was in effect.

Each information source could vary in its diagnosticity and its
reliability. These variables contribute independently to the total
valence or information value of the cue. Subjects integrated information
in problems of either 6, 8, or 10 cues, presented at a slow or fast speed,
in either a verbal (numerical) or spatial (graphical) format. After each
problem, subjects chose the most likely hypothesis and gave an analog
judgment of their confidence in that choice.


The results were examined from two perspectives. (1) From the
perspective of human engineering guidelines, the data indicated that
subjects' decisions were more accurate using the spatial display. This
finding supports the principle of S-C compatibility, which states that the
analog operations on which the judgments were based would be best served
by spatial displays. The spatial advantage was enhanced when the cues
were delivered at a slower speed, which imposed greater demands on working
memory. (2) The data were analyzed from the perspective of different
models of probabilistic information integration. In two respects subjects
tended toward optimal behavior: they tended to apply an absolute, rather
than a relative, judgment of cue reliability, and they did not appear to be
influenced by either recency or primacy (anchoring). However, they did
appear to downweight differences in the reliability of information
sources, relative to the optimal.
Introduction


Tactical decisions are made by military commanders operating within
command, control, and communications (C3) systems. C3 is defined as a
closed-loop man-machine system designed to facilitate decision making,
authority, and direction by properly designated personnel. Currently, C3
systems have a "real time" characteristic in which commanders operate under
conditions of time stress and uncertainty. The amount of available
information varies greatly, and the information typically has a short life
cycle.

Samet, Weltman, and Davis (1976) assert that computer-based military
systems for C3 operations have increased the rate and density of
information flow to such an extent as to overwhelm the commander. They
argue that more information may in fact degrade, rather than enhance, the
absolute level of decision making performance. Loo (1981) characterizes
the current state of affairs as "crisis management." The decision making
performance of the military commander is the essence of the C3 system, so
consideration of human performance limits in the iterative process of C3
system design is critical to system performance. Several approaches are
being taken to improve decision making performance within the C3 system.
Of these, probably the most successful involves the development of
decision aids.

Decision  Aids 

Decision aids can be implemented at different levels within the C3
system and serve inherently different functions. For example, linear
models can be used to augment or replace the decision maker (Dawes, 1979).
Proper linear models can be used in a normative sense to integrate
information for the decision maker. Paramorphic models are created by
modeling the decision maker and are considered improper linear models in
the sense that the derived weights are non-optimal. Bootstrapping
involves replacing the decision maker with a paramorphic model. Clinical
studies have demonstrated that all three models can outperform the
decision maker on various criterion variables (Dawes & Corrigan, 1974;
Dawes, 1979; Goldberg, 1970). Dawes (1979) addresses questions raised
about the technical, psychological, and ethical problems associated with
the implementation of these linear models. Several of the issues raised
there remain particularly relevant to the application of linear models in
tactical C3 systems.


Samet, Weltman, and Davis (1976) take yet another approach and make a
solid case for an adaptive computerized system that controls information
flow so as to best match overall system and human capabilities. They assert
that the multi-attribute information utility model is superior to the
decision maker at selecting the optimal amount and type of information in
inferential decision making tasks. They propose that this approach will
increase the efficiency and effectiveness of the decision maker.
The approach of the present study is somewhat different from those
discussed above. We are concerned with the design of information displays
that will enhance decision making performance in probabilistic information
integration tasks. The general task setting we consider is one in which
the operator must integrate a series of sequentially presented information
cues that vary in their diagnostic value with regard to a set of
hypotheses. Optimally, the operator uses this information to revise, in
working memory, a continuous analog "scale" of confidence that the evidence
favors one or the other hypothesis. Such a task imposes limitations of at
least two general classes. On the one hand, several investigators have
documented a wide variety of cognitive limitations related to such
heuristics as anchoring, representativeness, and the non-optimal weighting
of sequential cues (e.g., Kahneman & Tversky, 1973; Einhorn & Hogarth,
1981; Slovic, Fischhoff, & Lichtenstein, 1977; Lopes, 1982). On the other
hand, some limitations may be perceptual in nature, reflecting bottlenecks
between the way the information is physically formatted and the optimal
analog model that is to be maintained in working memory. The following
pages discuss in turn research related to information integration and
to stimulus processing.


Information  Integration 

An abundance of empirical evidence has demonstrated that human
judgment often does not reflect an optimal normative model. Tversky and
Kahneman (1973), Lyon and Slovic (1976), and others have shown that judges
neglect base rate information. Tversky and Kahneman have also demonstrated
that cognitive heuristics such as "representativeness," "anchoring and
adjustment," and "availability" result in systematic biases in judgment.
Wickens (1983) has summarized a series of examples in which subjects tend
to ignore differences in the reliability of information sources when
these are integrated in multi-element decision tasks. These phenomena
have sometimes been faulted for being highly problem-specific and of
insufficient magnitude to yield a priori empirical predictions
(Bar-Hillel, 1980; Wallsten, 1977, 1980). Nonetheless, enough studies have
demonstrated systematic biases in human judgment, resulting in deviations
from normative models, to suggest that these biases could represent an
important source of difficulty in the C3 environment.


Lopes (1982) has contrasted three models of information integration
used by subjects in different experimental settings. Each describes the
manner in which an internal subjective response, r, is attained and
updated from the values of two or more cues or sources of information.
These are the multiplying model, the averaging model, and the adding or
relative weighting model.
Multiplying models, where $r_{ij} = S_{A_i} S_{B_j}$, have been widely studied.
If an integration process adheres to this model, then the effect of one $S$
variable is magnified by increasing levels of the other. In many of its
applications to decision theory, the two stimulus or cue values represent
a weight, usually associated with a probabilistic value, and a scale value
associated with the content of the information. Thus changes in the value
of an information source are assumed to have progressively more impact on
$r$ as the weight of that source is increased.
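The interaction implied by the multiplying model can be shown numerically. The Python sketch below is illustrative only: the function names and the scale values are hypothetical, and the rule is simply r = S_A × S_B. It shows that the same step in S_A moves the response further as S_B grows.

```python
# Sketch of a multiplying integration rule, r = S_A * S_B.
# The scale values used below are hypothetical, chosen only to show
# how the impact of one variable grows with the level of the other.

def multiplying_response(s_a: float, s_b: float) -> float:
    """Integrated subjective response under a multiplying model."""
    return s_a * s_b


def impact_of_s_a(s_a_low: float, s_a_high: float, s_b: float) -> float:
    """Change in the response when S_A moves from low to high at a fixed S_B."""
    return multiplying_response(s_a_high, s_b) - multiplying_response(s_a_low, s_b)


# The same step in S_A (0.2 -> 0.8) moves r further as the weight S_B grows.
for s_b in (0.25, 0.5, 1.0):
    print(f"S_B = {s_b:.2f}: change in r = {impact_of_s_a(0.2, 0.8, s_b):.3f}")
```

Under an additive rule, by contrast, the change in r would be the same at every level of S_B.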


Averaging models assume that the information from two or more cues
will simply be averaged along some internal continuum, with each cue
"pulling" the average toward its particular value. This latter
characteristic is important. If a series of cues is being integrated
over time, the averaging model says that each new cue will always adjust
the running average toward the value of that cue, i.e., to a point between
the current average and the new cue value. While an averaging model is
appropriate in some circumstances (e.g., when the subject is tallying the
average value of a series of numbers), Lopes emphasizes that it is clearly
inappropriate for integrating probabilistic information. Consider, for
example, a Bayesian inference task in which belief in one hypothesis is
held to a relatively strong extent, and new evidence that only weakly
supports the same hypothesis is delivered. The averaging model predicts
that the new belief will be adjusted downward toward the neutral point,
reflecting a weighted average of the old belief and the new evidence. Yet
this is non-optimal: any evidence, no matter how weak, in favor of that
hypothesis should cause a shift toward greater belief. Lopes, however,
reports that many people demonstrate this same non-optimal bias, applying
averaging when a Bayesian inference is warranted.
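The contrast between averaging and Bayesian revision can be sketched in Python. This is an illustration under stated assumptions, not a model fitted by Lopes: belief is the probability of H1, each cue is summarized by the probability it would imply for H1 on its own from even prior odds (so its likelihood ratio is p/(1 − p)), and the averaging weight w = 0.5 is hypothetical.

```python
# Contrast of an averaging rule with Bayes' rule for a two-hypothesis task.
# Assumptions (illustrative only): belief is P(H1); each cue is summarized by
# the probability p it would imply for H1 on its own from even prior odds,
# so its likelihood ratio is p / (1 - p); the averaging weight w is hypothetical.

def averaging_update(belief: float, cue_p: float, w: float = 0.5) -> float:
    """New belief is a weighted average of the old belief and the cue value."""
    return (1.0 - w) * belief + w * cue_p


def bayes_update(belief: float, cue_p: float) -> float:
    """Posterior P(H1): prior odds times the cue's likelihood ratio."""
    prior_odds = belief / (1.0 - belief)
    likelihood_ratio = cue_p / (1.0 - cue_p)
    posterior_odds = prior_odds * likelihood_ratio
    return posterior_odds / (1.0 + posterior_odds)


belief = 0.90    # strong prior belief in H1
weak_cue = 0.60  # new evidence that only weakly favors H1

print(f"averaging: {averaging_update(belief, weak_cue):.3f}")  # pulled down toward the cue
print(f"Bayes:     {bayes_update(belief, weak_cue):.3f}")      # pushed further toward H1
```

The averaging rule lowers belief from .90 toward the weak cue, while Bayes' rule raises it, which is the non-optimal pattern Lopes describes.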


The third class of models, of which Bayesian inference is a specific
case, comprises the relative ratio models. These are optimal for
Bayesian tasks in which there are two polar alternatives. Each sampled
cue moves the current estimate in the direction of the hypothesis it
supports, by an amount given by the diagnostic value of the stimulus.
Only stimuli with no information value at all fail to move the current
pointer. Unlike the averaging model, the current pointer is moved toward
a new piece of sampled evidence only if the evidence is more extreme than
the value of the current pointer. In essence, then, new information is
added to the integration of previous information, rather than being
averaged with that information.
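One standard way to realize a relative ratio integrator is to work in log odds, where Bayes' rule becomes addition. The Python sketch below assumes, for illustration only, that each cue is summarized by the probability p it implies for H1 on its own, so its log-likelihood ratio is log(p/(1 − p)); the cue sequence is hypothetical.

```python
import math

# Sketch of a relative ratio (Bayesian) integrator for two polar hypotheses.
# In log odds, each cue's log-likelihood ratio is simply added to the running
# total. Assumption (illustrative): a cue is summarized by the probability p it
# implies for H1 on its own, so log LR = log(p / (1 - p)).

def log_likelihood_ratio(cue_p: float) -> float:
    return math.log(cue_p / (1.0 - cue_p))


def integrate_cues(cues, prior: float = 0.5):
    """Return the running P(H1) after each cue."""
    log_odds = math.log(prior / (1.0 - prior))
    trace = []
    for cue_p in cues:
        log_odds += log_likelihood_ratio(cue_p)  # new information is added, not averaged
        trace.append(1.0 / (1.0 + math.exp(-log_odds)))
    return trace


# Hypothetical cues: 0.5 carries no information and leaves the pointer where it
# is; 0.6 is less extreme than the current estimate (0.7) yet still raises it.
for step, p in enumerate(integrate_cues([0.7, 0.5, 0.6, 0.4]), start=1):
    print(f"after cue {step}: P(H1) = {p:.3f}")
```

Because addition is commutative, this integrator is also insensitive to cue order.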


It is clear from a review of the literature on information
integration that which model best describes performance is largely
dependent upon the problem situation and experimental design. Lopes (1982)
asserts that the difference between data that exhibit a relative ratio
procedure and data that exhibit an averaging procedure can be explained in
terms of the subjects' representation of the response scale and the order
in which stimulus features are to be processed. In this respect, Lopes was
able to make subjects more nearly normative Bayesian information
integrators by changing their adjustment strategies.


Each model has associated with it a series of weights, attached to the
sources of information, that dictate how much each new item of
information will affect the running estimate. Therefore another
characteristic that underlies models of information integration concerns
the weights that are applied to different sources of evidence. Three
characteristics of this weighting scheme are relevant to the present
research: (1) how weights are assigned according to the order of cue
arrival (the issue of primacy and recency, or serial position), (2) how
weights are assigned according to the nature of the information, and (3)
whether absolute or relative weights are employed.


(1) Serial position effects. Lopes (1982) has demonstrated the
manner in which the weights assigned to stimuli or information sources
arriving in sequence dictate various serial position effects. In
particular, heavy weights assigned to the first arriving cues indicate
anchoring or primacy: a reluctance to adjust current estimates in light
of newly arriving information. Heavy weights assigned to the final cues
suggest recency in the integration process. The potential role of these
two biases will be examined in the present experiment.
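Serial position effects can be pictured as position-dependent weights in an additive integrator. In the Python sketch below, the primacy and recency profiles (1/position counted from the front or the back) are hypothetical shapes chosen for illustration, not weights estimated by Lopes (1982); the cue values are also hypothetical.

```python
# Sketch of serial-position weighting in an additive integrator. Each cue's
# signed value (+ favors H1, - favors H2) is multiplied by a weight that depends
# only on its position in the sequence. The primacy and recency profiles are
# hypothetical shapes chosen for illustration.

def position_weights(n: int, profile: str) -> list[float]:
    if profile == "equal":      # optimal: order is irrelevant
        return [1.0] * n
    if profile == "primacy":    # anchoring: early cues count most
        return [1.0 / (i + 1) for i in range(n)]
    if profile == "recency":    # late cues count most
        return [1.0 / (n - i) for i in range(n)]
    raise ValueError(f"unknown profile: {profile}")


def integrate(values: list[float], profile: str) -> float:
    weights = position_weights(len(values), profile)
    return sum(w * v for w, v in zip(weights, values))


# The same four cues in two orders; their net evidence is neutral.
h1_first = [0.4, 0.4, -0.4, -0.4]
h2_first = list(reversed(h1_first))

for profile in ("equal", "primacy", "recency"):
    print(profile, round(integrate(h1_first, profile), 3), round(integrate(h2_first, profile), 3))
```

With equal weights both orders net to zero; primacy favors whichever hypothesis the early cues support, and recency favors the one supported by the late cues.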


(2) Differential weightings to different cues. Following the
multiplicative model of information integration, the reliability of a
piece of information should optimally be given the same weight as the
diagnosticity of that information in choosing between hypotheses (i.e.,
the extent to which the value of the cue is more likely under one
hypothesis than the other) (Johnson, Cavanagh, Spooner, & Samet, 1973).
These two should multiply to derive the total information "worth." Yet
there is evidence that when several sources of information differing in
both diagnosticity and reliability must be integrated, subjects tend to
discount differences in the reliability of information sources, treating
all sources as if they were fully reliable, and focusing attention instead
more exclusively on diagnosticity (e.g., Kanarick, Huntington, & Peterson,
1969; Kahneman & Tversky, 1973; Schum, 1975). We shall be investigating
this "as if" heuristic.
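The multiplicative worth of a cue and the "as if" heuristic can be compared directly. In this Python sketch, the reliability exponent k is a hypothetical device for illustration (k = 1 gives the optimal product, k = 0 ignores reliability, values in between downweight reliability differences), and both cues are hypothetical.

```python
# Sketch of cue "worth" (diagnosticity x reliability) and of the "as if"
# heuristic, in which sources are treated as if fully reliable. The exponent k
# is hypothetical: k = 1 gives the optimal product, k = 0 ignores reliability,
# and intermediate values downweight reliability differences.

def perceived_worth(diagnosticity: float, reliability: float, k: float = 1.0) -> float:
    return diagnosticity * reliability ** k


# Two hypothetical cues: one highly diagnostic but unreliable, one less
# diagnostic but highly reliable.
cue_a = (0.8, 0.3)
cue_b = (0.4, 0.9)

for k in (1.0, 0.5, 0.0):
    wa, wb = perceived_worth(*cue_a, k), perceived_worth(*cue_b, k)
    better = "A" if wa > wb else "B"
    print(f"k = {k}: worth A = {wa:.2f}, worth B = {wb:.2f}, judged stronger: {better}")
```

Optimally, the reliable cue B is worth more; once reliability is discounted, the ranking reverses in favor of the diagnostic but unreliable cue A.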


(3) Absolute versus relative weighting. The issue here concerns the
extent to which people weigh the strength of evidence for or against a
particular hypothesis relative to the total amount of evidence presented.
Will the subject, for example, who is confronted with a piece of evidence
favoring a hypothesis by 70/30 odds be more inclined to believe that
hypothesis true if this was the only evidence viewed than if he had
previously viewed 10 pieces of evidence whose net effect was neutral
(i.e., favored neither hypothesis)? If the answer is affirmative, then
the subject is employing some form of relative weighting scheme: the
second case is seen by the subject to provide weaker evidence because the
70/30 odds were only obtained after 11 cues, whereas in the first case the
same odds were attained with only a single cue. On the other hand, if the
subject views the two situations as providing equivalent evidence, then he
is following an absolute weighting scheme. Applying optimal Bayesian
procedures, the posterior odds equal the prior odds multiplied by the
likelihood ratio, and the question of how many cues were required to
attain those prior odds is immaterial. This, then, is another issue we
shall address.
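The Bayesian argument for absolute weighting can be checked with exact arithmetic. The Python sketch below keeps likelihood ratios as fractions and compares a single 70/30 cue with the same cue arriving after 10 cues whose likelihood ratios (hypothetically 2 and 1/2, alternating) cancel. The `relative_strength` function is a hypothetical stand-in for a relative weighting scheme, not a model from the literature cited here.

```python
from fractions import Fraction
import math

# Sketch of absolute (Bayesian) weighting: posterior odds are the prior odds
# times the product of the cues' likelihood ratios, so the number of cues that
# produced the prior odds does not matter. All cue values are hypothetical.

def posterior_odds(likelihood_ratios, prior_odds=Fraction(1)):
    odds = prior_odds
    for lr in likelihood_ratios:
        odds *= lr
    return odds


def odds_to_probability(odds: Fraction) -> Fraction:
    return odds / (1 + odds)


single_cue = [Fraction(7, 3)]                                         # one cue at 70/30
after_neutral = [Fraction(2), Fraction(1, 2)] * 5 + [Fraction(7, 3)]  # 10 cues netting to even odds, then 70/30

p_single = odds_to_probability(posterior_odds(single_cue))
p_after = odds_to_probability(posterior_odds(after_neutral))
print(p_single, p_after)  # both 7/10


def relative_strength(likelihood_ratios) -> float:
    """Hypothetical relative rule: net log odds scaled by the number of cues."""
    return math.log(float(posterior_odds(likelihood_ratios))) / len(likelihood_ratios)
```

Both cases yield a posterior probability of exactly .70, whereas the hypothetical relative rule rates the 11-cue case as weaker evidence.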


The problem situation and experimental design of an investigation by
Fleming (1970) are particularly relevant to our study. In his experiment,
subjects processed conflicting information in a tactical decision making
task. Decision making performance was compared to a Bayesian inference
model whereby probabilities from successive information sources should,
normatively, be multiplied to arrive at an overall probability for each of
three alternative hypotheses. Fleming concludes that rather than obeying
a normative model, the majority of subjects used an adding model.
Furthermore, all subjects who did not receive feedback were reported to
have used an adding model. It is not clear what method of analysis
Fleming used in concluding that his data exhibited an adding model of
information integration.


A descriptive adding-multiplying model has been used to represent
performance in the current study. The descriptive model of information
integration is illustrated in Figure 1. The specific stimulus dimensions,
reliability and diagnosticity, will be defined at a later point. Models A
and B both present the same information; however, Model B requires only
half the processing of Model A, i.e., only the differences between the
information values supporting Hypothesis A and those supporting
Hypothesis B are given in Model B. Model B is more efficient and
represents the actual form of the adding-multiplying model used in the
current study. These models are similar to the adding model found in
Fleming's (1970) study in that information integration across cues is an
adding process. The optimum values of individual cues were given in
Fleming's study, whereas they are determined via a multiplicative process
in our model. It is important to note that the model is not complete in
that it does not describe a difference judgment. That is, at the end of
the trial the subject has a trial value for each competing hypothesis,
Hypothesis A and Hypothesis B, and must now differentiate between the two
and decide which is the most likely hypothesis. This process will be
discussed below in the experimental predictions.
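The adding-multiplying procedure can be sketched in Python in the spirit of Model B: each cue's valence is diagnosticity × reliability, and valences are summed separately for the hypothesis each cue supports. The `Cue` class, the "North"/"South" labels, and the cue values are hypothetical. The final step simply picks the larger total as a placeholder, since the text leaves the form of the final difference judgment open.

```python
from dataclasses import dataclass

# Sketch of an adding-multiplying integrator in the spirit of Model B. Each
# cue's valence is diagnosticity x reliability; valences are summed separately
# for the hypothesis each cue supports. The Cue class, labels, and values are
# hypothetical, and picking the larger total is only a placeholder decision rule.

@dataclass(frozen=True)
class Cue:
    hypothesis: str      # "North" or "South"
    diagnosticity: float
    reliability: float

    @property
    def valence(self) -> float:
        return self.diagnosticity * self.reliability


def sum_valences(cues: list[Cue]) -> dict[str, float]:
    totals = {"North": 0.0, "South": 0.0}
    for cue in cues:
        totals[cue.hypothesis] += cue.valence
    return totals


def most_likely(totals: dict[str, float]) -> str:
    # Ties go to whichever key comes first; a real task would need a tie rule.
    return max(totals, key=totals.get)


# Successive cues alternate between the two hypotheses.
problem = [
    Cue("North", 0.30, 0.50),
    Cue("South", 0.60, 0.50),
    Cue("North", 0.50, 0.90),
    Cue("South", 0.20, 0.90),
]
totals = sum_valences(problem)
print(totals, most_likely(totals))
```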


Stimulus Structure

In the present experiment, our subjects were required to integrate a
series of cues, each of which differed in reliability and diagnosticity.
As noted above, these two values should optimally be multiplied to produce
a new quantity, the total information value, worth, or valence of the cue,
and it is these valences we asked subjects to derive and integrate across
cues. In the "verbal" display condition, the reliability and
diagnosticity values were presented numerically. In the spatial display,
they were presented graphically as the height and base of a rectangle,
respectively. The spatial display has two potential advantages over the
verbal display.

(1) According to the principle of
stimulus/central-processing/response (S-C-R) compatibility, outlined by
Wickens, Sandry, and Vidulich (1983), tasks that demand spatial/analog
processes in working memory will be best served by visual spatial displays
and more poorly served by verbal displays (either speech or print). Since
the present task requires that the subject update a continuous scale of
confidence in working memory with each added cue, S-C-R compatibility
theory predicts better performance with the graphical display.


(2) An added benefit of the format of the graphical display is that
the height and width (diagnosticity and reliability) of each rectangle cue
are combined in such a way as to produce a new dimension, area, that is
directly equal to the cue valence measure subjects are supposed to
integrate. A series of investigations indicates that these two dimensions
are "integral" and so are combined "automatically" and holistically by the
perceptual system to generate a direct perception of rectangular area
(Garner, 1974; Garner & Felfoldy, 1970; Lockhead, 1979). Hence, with the
spatial display, the subject does not need to engage in the conscious,
cognitively loading multiplication process to combine the two dimensions
and derive the valence measure.


There is, however, one potential drawback to the use of the holistic
analog display that may lead to systematic biases. Smith (1969) has
argued that the perception of rectangle size is influenced not only by
area, but by perimeter as well. This conclusion accounts for Anderson and
Weiss's (1971) observation that the size of highly eccentric rectangles is
consistently overestimated. Hence, elongated rectangles will be judged as
larger than squares of the same area, because the perimeter of the former
is greater. If this bias operates, then subjects will tend to
overestimate the valence of cues in which reliability and diagnosticity
are negatively correlated (producing elongation), relative to the
square-producing cases in which the two variables covary in a positive
fashion.
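The direction of the predicted perimeter bias can be illustrated with two rectangles of equal area. In this Python sketch, the perceived-size rule (area plus beta × perimeter) and the value of beta are hypothetical, used only to show the direction of the bias; they are not estimates from Smith (1969) or Anderson and Weiss (1971). The rectangle dimensions are also hypothetical.

```python
import math

# Sketch of the perimeter bias for rectangle displays. Cue valence equals the
# rectangle's area (diagnosticity x reliability). The perceived-size rule,
# area + beta * perimeter, and the value of beta are hypothetical.

def area(width: float, height: float) -> float:
    return width * height


def perimeter(width: float, height: float) -> float:
    return 2.0 * (width + height)


def perceived_size(width: float, height: float, beta: float = 0.1) -> float:
    return area(width, height) + beta * perimeter(width, height)


square = (0.6, 0.6)     # diagnosticity and reliability covary positively
elongated = (0.9, 0.4)  # negatively correlated: same area, longer perimeter

print(math.isclose(area(*square), area(*elongated)))        # equal valence
print(perceived_size(*elongated) > perceived_size(*square))  # judged larger
```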
Experimental Predictions


Experiment 1

A series of decision problems was designed in the context of a
tactical battlefield scenario. The subject was designated as a commander
responsible for defending an area through which an attack from a
fictitious threat force was imminent. Threat force doctrine as well as
terrain features in the area of operations dictated that the attack would
come along a narrow front from either the north (H1) or south (H2) of
their sector. It was the subject's duty to analyze the available
intelligence and decide which avenue of approach, north or south, the
threat force would take. Appendix A demonstrates this scenario. The
information for each hypothesis in each problem was presented sequentially
from several sources of information, or cues. Each source conveyed
information for one of the two possible hypotheses. The worth of each
source was determined by two dimensions: reliability (e.g., air
reconnaissance report, reliability = .80) and diagnosticity (e.g.,
destruction of obstacles to south, diagnosticity = .70). Subjects were
instructed to evaluate the information presented and decide which
hypothesis concerning future threat force actions was most likely to
occur. The effects of five parameters on decision accuracy and confidence
were studied within the framework of descriptive Model B presented in
Figure 1. Predictions of these effects and how they bear on the model of
information integration are now discussed.

1. Figures 2 and 3 illustrate the verbal and spatial code formats of
the information display, respectively. It was predicted that the spatial
code format, which utilizes dimensional integrality, would enhance decision
accuracy. This prediction, as noted above, is based both upon the principle
of S-C compatibility and upon the integral, "configural" nature of the
rectangular object display.


While the major emphasis of the present experiment was placed on the
distinction between verbal and analog display formats, we were also
interested in the influences of four additional decision problem
variables, both in their own right and as they might modify the effect of
display format. These are described as follows.


2. The time available to process each cue in a decision problem was
varied. The main effect of this variable was of less interest than were
the interaction effects of this variable with cue coding. We were
interested in how any advantages of the spatial display might be modulated
by varying information rate. On the one hand, increased rate produces an
increased degree of time stress, presumably a detrimental effect. On the
other hand, this may be balanced by the fact that the faster rate imposes
less of a burden on working memory for the integration of successive cues.
Because of these two counteracting trends, we were unable to predict a
priori the ultimate effect of this variable.
[Figure 1 panels: Model A, information value for Hyp A and Hyp B, with cue diagnosticity $d_i = |S_{iA} - S_{iB}|$; Model B, cue diagnosticity for Hyp A and Hyp B.]

Figure 1: Decision models. Model A represents a general adding-multiplying
process; cue diagnosticity equals the absolute difference of the
information value of each alternative hypothesis for a given cue. Model B
represents a simplification in that only the cue diagnosticity is
presented for the hypothesis having the greatest information value, S.
CUE: ENEMY AIR RECONNAISSANCE TO NORTH
DIAGNOSTICITY: ATTACK NORTH .30
RELIABILITY: ISOLATED SCOUT REPORT .50

Figure 2: Information display with verbal code format.
Figure 3: Information display with spatial code format.
3. The problem size, or number of cues presented within a problem, was
varied. Two different effects were predicted. First, decision accuracy
was expected to decrease with increased problem size due to the effects
of increased memory load. Second, confidence was predicted to increase
with increased problem size. This direct relationship is clearly
supported in a vast number of studies but is particularly relevant to
within-subject designs (Kaplan & Major, 1973).


4. Cue variability, defined as the direction of correlation between
cue diagnosticity and reliability, was varied in this experiment. Cues
having a negative correlation between diagnosticity and reliability
values, which we call high variability cues, were expected to be
overestimated in the spatial format condition. As described above, we
predict that the "perimeter effect" may produce a bias to overweight the
information value of highly eccentric rectangles. If this effect operates
in our task, then high variability cues will be overestimated in the
spatial format condition. Correspondingly, if one of the two competing
hypotheses is supported by high variability cues, the subjective value of
support for this hypothesis will be overestimated. Because the perimeter
effect is not relevant to the verbal code format condition, an
overestimation bias is not expected in this condition. Therefore, a code
× variability interaction is expected.


5. Finally, the total difference in information presented for the
competing hypotheses within a trial was varied between problems. As
stated above, the model of information integration presented in Figure 1
is incomplete. It describes the procedure for summing information for
each hypothesis over successive cues, but does not describe how the final
judgment on the difference of information is to be made. The preference
and ratio models discussed above are relevant to this final judgment
process. The preference model, where preference $= V_1 - V_2$ (the summed
values for the two hypotheses), can clearly predict the most likely
hypothesis, but intuitively does not describe a confidence judgment. A
ratio model would describe confidence as a function of $V_1/(V_1 + V_2)$.
Conversely, the ratio model can describe the confidence rating but not the
initial decision process of determining the most likely hypothesis. A
combination of both models is therefore evaluated, in which

$$\text{weighted difference} = \frac{V_1 - V_2}{V_1 + V_2}.$$

It was predicted that confidence would be directly related to the weighted
difference factor. Optimality of information integration can be judged by
the extent to which judged confidence covaries with the actual weighted
(or absolute) difference.

Experiment 2


The objective of this experiment was to determine whether there are
biases associated with processing the diagnosticity and reliability
stimulus dimensions. Hence, cues of high diagnosticity and low reliability
were consistently presented for one hypothesis, and cues of low
diagnosticity and high reliability for the competing hypothesis. If
subjects tend to treat reliability as a categorical, overweighted variable,
then the hypothesis supported by evidence for which objective reliability
is low should be systematically favored.

Method:  Experiment  1 


Subjects 


Eight undergraduate students at the University of Illinois, four male and four female, volunteered to serve in this experiment. All subjects had normal or corrected vision and were paid $3.00 per hour for their participation in each of the three days of testing.

Apparatus 

Subjects were seated in a light- and sound-attenuated booth containing a 10 cm x 8 cm Hewlett-Packard model 1330A CRT and two spring-return push-button keyboards. The keyboards at the right and left hands had two and ten buttons, respectively. Subjects sat approximately 90 cm from the display. A Digital Equipment Corporation PDP-11 16-bit computer with 24K memory was used to generate the experimental displays and record subject performance.


Task 


Figures 2 and 3 illustrate examples of the cues of military intelligence, a series of which were presented to subjects sequentially. Successive cues alternated in support of the two possible courses of action that the threat force might take: attack North or attack South. Subjects were instructed to process the cues using an adding-multiplying model. That is, cue valence was to be equal to diagnosticity x reliability, and successive cue valences were to be summed in support of their respective hypotheses. At the completion of the trial, subjects were prompted to assess the difference in support of the two alternative hypotheses and to indicate which was the most likely enemy course of action using the two-button keyboard. Subjects were then presented a confidence scale ranging from 0 to 9, anchored by "absolutely uncertain" and "absolutely certain," respectively. Figure 4 illustrates this confidence scale. Subjects were instructed to assess how confident they were that the threat force would execute the course of action which they had judged most likely given the available information. The ten-button keyboard was used for this response.
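The instructed adding-multiplying rule can be sketched in a few lines of Python. This is an illustrative sketch, not the software used in the experiment: the function name `integrate_trial`, the integer rating scale for diagnosticity and reliability, and the cue values are all hypothetical.

```python
from typing import Dict, List, Tuple

# A cue is (hypothesis, diagnosticity, reliability). The integer scale used
# for diagnosticity and reliability is a hypothetical stand-in.
Cue = Tuple[str, int, int]


def integrate_trial(cues: List[Cue]) -> Tuple[Dict[str, int], str]:
    """Apply the adding-multiplying model to one trial.

    Multiplying step: each cue's valence is diagnosticity x reliability.
    Adding step: valences are summed separately for each hypothesis.
    """
    totals = {"North": 0, "South": 0}
    for hypothesis, diagnosticity, reliability in cues:
        totals[hypothesis] += diagnosticity * reliability
    # Decision: the hypothesis with the larger summed valence. An exact tie
    # would go to "North" because max() keeps the first maximal key.
    choice = max(totals, key=lambda h: totals[h])
    return totals, choice


# Hypothetical six-cue trial; successive cues alternate between hypotheses.
trial = [
    ("North", 7, 6), ("South", 5, 5),
    ("North", 4, 8), ("South", 6, 6),
    ("North", 5, 5), ("South", 3, 7),
]
totals, choice = integrate_trial(trial)
print(totals, choice)  # {'North': 99, 'South': 82} North
```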
Figure 4: Confidence response scale.




Stimulus Design

Each cue consisted of a statement about current threat force disposition or relevant environmental conditions in the area of operations. The source of the information was also presented, along with values of diagnosticity (the relevance of the information) and reliability (the credibility of the source). The following factors were manipulated orthogonally in the experimental design.


Coding. Two cue formats were evaluated. A verbal code format is presented in Figure 2. The alternative spatial code design, utilizing dimensional integrality, is presented in Figure 3. Cue valence is represented by the area of the darkened rectangle, whose height is the diagnosticity value and whose width is the reliability value. Position coding is used to indicate which of the two alternative hypotheses is supported by the cue: information presented above the abscissa supports a threat attack to the North, and information presented below supports a threat attack to the South.


Problem size. Three levels of problem size were used: a total of six cues, three in support of each hypothesis; eight cues total, four for each hypothesis; and ten cues total, five for each hypothesis.

Trial variability (VARIABILITY). This factor concerns the relationship between the cue diagnosticity and reliability dimensions. Cues in which reliability and diagnosticity are positively correlated are designated low variability cues; hence, in problems containing only such cues, most of the rectangles in the spatial format are "squarish." Cues with high diagnosticity and low reliability, or low diagnosticity and high reliability, are designated high variability cues. These cues have the form of eccentric rectangles in the spatial format. The matrix presented in Figure 5 depicts the respective diagnosticity and reliability values associated with high and low variability cues.


Trial variability has two levels. In the low case, both alternative hypotheses are supported with low variability cues. In the high trial variability case, one hypothesis is supported with low variability cues and the second hypothesis with high variability cues. Hence, if a perimeter bias is present, subjects will "overpredict" the second hypothesis. Note that cue variability is not a factor but a stimulus condition. Trial variability is the experimental factor; its two levels depend on the cue variability conditions assigned to the two alternative hypotheses during a trial.
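Why a perimeter bias would favor the eccentric rectangles can be seen with a small numerical sketch. The integer heights and widths below are hypothetical (the actual values are given in Figure 5): the two rectangles have equal area, and therefore equal valence under the multiplying rule, yet the eccentric one has the larger perimeter.

```python
def valence(diagnosticity: int, reliability: int) -> int:
    """Normative valence: the rectangle's area (height x width)."""
    return diagnosticity * reliability


def perimeter(diagnosticity: int, reliability: int) -> int:
    """What a perimeter-biased observer would track instead of area."""
    return 2 * (diagnosticity + reliability)


squarish = (6, 6)   # hypothetical low-variability cue: height and width similar
eccentric = (9, 4)  # hypothetical high-variability cue: high diagnosticity, low reliability

for name, (d, r) in [("squarish", squarish), ("eccentric", eccentric)]:
    print(f"{name}: valence={valence(d, r)}, perimeter={perimeter(d, r)}")
# squarish: valence=36, perimeter=24
# eccentric: valence=36, perimeter=26
```

Because a square has the smallest perimeter of any rectangle with a given area, an observer who judged size by perimeter would overrate every eccentric rectangle relative to a squarish one of equal valence.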
Presentation time (TIME). Two levels of cue presentation time were used. Cues were presented for three seconds (fast) or five seconds (slow). The interstimulus interval remained constant at one second.

Weighted difference. The weighted difference was computed by dividing the absolute difference of support presented for the two hypotheses by the total support presented for both hypotheses. Hence,

$$\text{weighted difference} = \frac{\left|\sum_{i=1}^{n} r_{iA}\, d_{iA} - \sum_{i=1}^{n} r_{iB}\, d_{iB}\right|}{\sum_{i=1}^{n} r_{iA}\, d_{iA} + \sum_{i=1}^{n} r_{iB}\, d_{iB}} \times 100\%$$

where $r_{iA}$ and $d_{iA}$ are the reliability and diagnosticity of the $i$th cue supporting hypothesis A, $r_{iB}$ and $d_{iB}$ are the corresponding values for hypothesis B, and $n$ is the number of cues per hypothesis.


Sets of problems having three equally spaced levels of weighted difference were used: 5-10%, 15-20%, and 25-30%.
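The weighted-difference computation and its assignment to the three design levels can be sketched as follows. The cue values and integer rating scale are hypothetical, and the helper names `weighted_difference` and `design_level` are illustrative only.

```python
from typing import Optional, Sequence, Tuple

Cue = Tuple[int, int]  # (diagnosticity, reliability) on a hypothetical scale

# The three weighted-difference levels used in the design, in percent.
LEVELS = {"5-10%": (5.0, 10.0), "15-20%": (15.0, 20.0), "25-30%": (25.0, 30.0)}


def weighted_difference(cues_a: Sequence[Cue], cues_b: Sequence[Cue]) -> float:
    """|V_A - V_B| / (V_A + V_B) x 100, where V is the summed r x d."""
    v_a = sum(d * r for d, r in cues_a)
    v_b = sum(d * r for d, r in cues_b)
    return 100 * abs(v_a - v_b) / (v_a + v_b)


def design_level(wd: float) -> Optional[str]:
    """Return the design level containing wd, or None if it falls between levels."""
    for label, (low, high) in LEVELS.items():
        if low <= wd <= high:
            return label
    return None


north = [(7, 6), (4, 8), (5, 5)]  # summed valence 99
south = [(5, 5), (6, 6), (3, 7)]  # summed valence 82
wd = weighted_difference(north, south)
print(round(wd, 2), design_level(wd))  # 9.39 5-10%
```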


Design 

A within-subjects design was employed in which each subject participated in all experimental conditions over two sessions. Each session lasted approximately one hour and consisted of 36 trials. Figure 6 illustrates the 72 trial conditions presented over the two sessions.


Trial order within a session was blocked by variability and weighted difference (2 x 3), yielding six blocks per session. The six trials within each block were presented in random order, and the six blocks of each session were randomly ordered for each subject. The time factor was split by session such that all 36 trials of the first session were at the slow level and all 36 trials of the second session were at the fast level.


Procedure 


Subjects were run individually. One 90-minute practice session preceded the two experimental sessions. During the practice session, subjects were presented the tactical scenario described earlier and in detail in Appendix A. Subjects were also given definitions of cue reliability and diagnosticity and examples of each, as illustrated in Appendix B. Any military terminology used in the experiment was thoroughly explained. Finally, subjects were instructed on the use of the adding model for integrating successive cues within a trial. A visual aid, illustrated in Appendix C, was used for instructing this process.

During the remainder of the practice session, approximately 30 minutes, practice trials were run. All subjects achieved a minimum of five successive correct trials during this period.


Subjects were given eight practice trials at the start of each of the two experimental sessions. A five-second interval between trials and a 60-second rest between blocks of trials were imposed during the experimental sessions.


At the conclusion of the experiment, subjects were asked to explain how they integrated successive cues in a trial and how their confidence rating was related to the information integration process.

Results:  Experiment  1 


Analysis  of  Experimental  Effects 

Accuracy. Table 1 presents a summary of decision accuracy as a function of each of the five independent variables, averaged over replications. Arc sine transformations were performed on the accuracy percentages. An analysis of variance performed on these data revealed significant main effects on decision accuracy for three of the five variables studied. No interaction effects were statistically significant, and hence only main effects are shown in the table. A large effect was found for coding (t(1,7) = 3.85, p < .025).¹ As predicted, the spatial code format yielded an improvement in decision accuracy over the verbal format. The effect of trial variability on decision accuracy was also statistically significant (t(1,7) = 2.17, p < .05). Decision accuracy was best in the low variability condition, and poorer when one hypothesis was supported by the more eccentric rectangles. Finally, the main effect of time was statistically significant (t(1,7) = 4.27, p < .005). Decision accuracy was greater in the fast presentation condition (3 seconds) than in the slow presentation condition (5 seconds).
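The report does not state which variant of the arc sine transformation was used. The sketch below assumes the common angular transform, arcsin(√p), applied to accuracy expressed as a proportion; the Table 1 coding means serve only as example inputs, whereas the analysis itself would transform the per-subject cell percentages.

```python
import math


def arcsine_transform(percent_correct: float) -> float:
    """Angular (arcsine square-root) transform of a percentage, in radians.

    Binomial proportions near 0 or 1 have compressed variance; arcsin(sqrt(p))
    stretches those extremes so that variances are more nearly equal across
    conditions before an analysis of variance.
    """
    p = percent_correct / 100.0
    if not 0.0 <= p <= 1.0:
        raise ValueError("percent_correct must lie between 0 and 100")
    return math.asin(math.sqrt(p))


# Table 1 coding means, used here only as example inputs.
for label, pct in [("spatial", 97.6), ("verbal", 92.0)]:
    print(f"{label}: {pct}% -> {arcsine_transform(pct):.4f} rad")
```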

In interpreting the time effect, it should be recalled that the main effect of time was not of interest in this study; the interaction effects of time were of primary concern. For this reason the two levels of time were split between sessions: all slow conditions were run in the first session and all fast conditions in the second. Consequently, the time effect is possibly the result of a practice artifact. It is also possible, however, that the time manipulation produced a memory effect rather than a time stress effect. That is, at the slower 5-sec cue presentation time, the loss of information about each cue due to memory decay accounts for a large portion of the process limitations. It is therefore plausible that a lesser degree of memory decay accounts for the increase in decision accuracy at the fast cue presentation time.


¹ t-tests rather than F-tests were used to examine the three two-level main effects.
Table 1

Percent Accuracy

Variable              Level        Accuracy (%)
Trial Variability     Low          97.2
                      High         92.3
Coding                Spatial      97.6
                      Verbal       92.0
Time                  Slow         93.4
                      Fast         96.2
Weighted Difference   5-10%        94.2
                      15-20%       95.3
                      25-30%       94.8
Problem Size          6 (total)    95.3
                      8 (total)    95.3
                      10 (total)   93.8
Confidence. Raw confidence data were transformed using the following algorithm: if the decision response is correct, then absolute confidence = 10 + confidence rating; if the decision response is incorrect, then absolute confidence = 10 - confidence rating. This transformation maps confidence onto a range of 1 to 19. It is made to penalize subjects more heavily if they made an error (chose the incorrect hypothesis) when they were extremely confident of being correct. Since roughly 95% of the responses were correct, most of the transformed values are greater than 10. Table 2 presents the data summary for the transformed confidence scores.
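The transformation rule can be written directly as a function. The name `absolute_confidence` is hypothetical; the mapping itself follows the rule stated above.

```python
def absolute_confidence(rating: int, correct: bool) -> int:
    """Map a 0-9 confidence rating onto the 1-19 absolute-confidence scale.

    Correct decisions map to 10 + rating (10..19); incorrect decisions map to
    10 - rating (1..10), so a highly confident error scores lowest.
    """
    if not 0 <= rating <= 9:
        raise ValueError("rating must be on the 0-9 scale")
    return 10 + rating if correct else 10 - rating


print(absolute_confidence(9, True))   # 19: certain and correct
print(absolute_confidence(9, False))  # 1: certain but wrong
print(absolute_confidence(0, False))  # 10: "absolutely uncertain" either way
```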


A six-way repeated measures ANOVA (code x weighted difference x time x problem size x variability x replication) was performed on the transformed confidence data. The main effects of code and problem size were not significant (F(1,7) = 2.47, p < .16, and F(2,14) = 0.78, p = .48, respectively). Significant effects were found for weighted difference (evidence), time, and trial variability, and for the code x time and problem size x time interactions.


A very large effect of weighted difference was found (F(2,14) = 25.10, p < .0001). This effect is particularly important for two reasons. First, it demonstrates that subjects are extracting more information when more information is available. Second, it demonstrates that subjects are using the confidence response scale as an analog for evidence.


Figure 7 portrays the two main interactions observed on the confidence variable. The figure shows confidence with the verbal display in the left panel and the spatial display in the right. The abscissa within each panel represents the effects of problem size, and the two functions are those for the slow (dashed line) and fast (solid line) speeds, respectively. Each data point is collapsed across the three levels of weighted difference.


Across both panels we find that the faster speed generated reliably higher confidence (F(1,7) = 5.88, p < .05). As noted in discussing the analogous effect on accuracy, this might reflect the influence of practice, but it might also be related to the effect of memory decay: the faster rate produces a smaller loss of information during integration and therefore warrants higher confidence in the final response. This interpretation is supported to some degree by the reliable interaction between speed and problem size (F(2,14) = 5.97, p < .01). Examining Figure 7 suggests that the major source of this interaction is between the 8- and 10-cue problems. Increasing problem size from 8 to 10 increases confidence at the fast rate but diminishes it at the slow rate. This suggests that two factors may be operating with changes in presentation speed. When the rate is slow, a good deal of forgetting of earlier cues takes place, so with more cues present more is forgotten and confidence is thereby reduced. This is substantiated by the marked loss in accuracy in this particular condition, there being twice as many errors here as in any of the other conditions. When the rate is fast there is less opportunity for decay. Here more cues lead to increased confidence because of the subjective belief that the evidence is more reliable.

Table 2

Absolute Confidence

Variable              Level            Mean (1-19 scale)
Trial Variability     Low              14.95
                      High             13.37
Coding                Verbal           13.92
                      Spatial          14.41
Time                  Slow (5 sec)     13.93
                      Fast (3 sec)     14.40
Weighted Difference   Low (5-10%)      12.76
                      Med (15-20%)     14.14
                      High (25-30%)    15.60
Problem Size          6 (total)        14.0
                      8 (total)        14.26
                      10 (total)       14.23

Figure 7: Code, speed, and problem size effects on confidence.


An additional time x code interaction was statistically significant (F(1,7) = 6.34, p < .04) and is also illustrated in Figure 7. The verbal display shows a greater increase in confidence with faster speed than does the spatial display, which appears to be little affected by speed at all, particularly with the small problems. There are two likely interpretations, the first of which corresponds to the practice artifact interpretation of the main effect of time. The information integration task is process limited and aided by the integral spatial display. The benefits of the spatial format over the verbal format decrease with practice (the faster speed) as the process limitations are reduced. Phrased in terms of our initial prediction, this interpretation suggests that the S-C-R incompatibility between the verbal cue format and the spatial central processing code is somewhat overcome with practice. Alternatively, all or part of the process limitations may be memory related, and the benefits of the integral spatial display are largest when the greatest demands are placed on memory, which is likely to occur at the slow presentation rate.


A large main effect of trial variability was obtained (F(1,7) = 22.47, p < .005). Confidence ratings were higher in low variability trials than in high variability trials. In the spatial code condition, this finding was predicted to result from an overestimation of high variability cues if perimeter estimates biased the computation of cue valence. In the present study, 24 of the 36 high variability trials were designed such that an overestimation of high variability cues would result in a confidence decrement; that is, the incorrect hypothesis was supported by high variability cues. The code x trial variability interaction was not statistically significant (F(1,7) = .01, p = .94). This indicates that if high variability cues are overestimated, there is no differential effect between the spatial and verbal code formats. Therefore, the overestimation bias appears to be operating in both the spatial and verbal code conditions, and so could not result from a "perimeter effect." Hence, it appears that high trial variability simply makes information integration more difficult, with a resulting loss in both accuracy and confidence.


Regression analysis. The large effect of weighted difference on confidence discussed above demonstrates that subjects are treating the confidence scale as an analog of an evidence or information factor. We assume that the good decision maker is one who gains confidence as the difference in evidence between the competing hypotheses grows, and who is "absolutely uncertain" when equal evidence is presented for both competing hypotheses. The following analysis investigates the relationship between evidence and confidence ratings, and the effects of our other independent variables on this relationship.


In order to capture a measure of how confidence varied with evidence, regression analyses of confidence on the three levels of weighted difference were performed for each subject in each of the 24 trial condition cells (code(2) x time(2) x problem size(3) x trial variability(2)). A five-way repeated measures ANOVA (code x time x problem size x trial variability x replication) was performed on each of the resulting statistics: slope, intercept, and residual mean square.
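The three statistics taken from each cell can be obtained from an ordinary least-squares fit, sketched below. The x values (midpoints of the three weighted-difference levels) and the confidence ratings are assumptions for illustration; `fit_cell` is a hypothetical helper name. With only three points per cell, the residual mean square has a single degree of freedom.

```python
from typing import Sequence, Tuple


def fit_cell(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float, float]:
    """Least-squares regression of y on x.

    Returns (slope, intercept, residual mean square), where the residual
    mean square is SSE / (n - 2).
    """
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need at least three paired observations")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x  # predicted confidence at zero difference
    sse = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    return slope, intercept, sse / (n - 2)


levels = [7.5, 17.5, 27.5]    # hypothetical midpoints of 5-10%, 15-20%, 25-30%
confidence = [4.0, 6.0, 7.0]  # hypothetical ratings for one cell
slope, intercept, rms = fit_cell(levels, confidence)
print(f"slope={slope:.3f} intercept={intercept:.3f} rms={rms:.3f}")
# slope=0.150 intercept=3.042 rms=0.167
```

With so few points a single aberrant rating dominates the fit: changing the first rating from 4 to 7 in this example flattens the slope from 0.15 to 0.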


The main effect of coding was our primary interest in the ANOVA performed on slope. We have suggested that the spatial code format is more S-C compatible with an analog confidence scale, so a difference in sensitivity, or slope, would be of interest. This effect, however, was not statistically significant (F(1,7) = .25, p = .63), nor were any other main effects.


The intercept statistic was interpreted as a measure of overconfidence. That is, we interpret a positive confidence estimate, when the support for both hypotheses is extrapolated to be equal, as a measure of overconfidence; optimal decision makers are "absolutely uncertain" in this situation. This ANOVA showed a significant main effect of trial variability (F(1,7) = 8.44, p < .02). The mean intercept of the low variability trials (2.10) was greater than that of the high variability trials (0.78), a difference of 1.32. A main effect of trial variability on confidence was described earlier, and Table 2 shows that the mean confidence rating in the low trial variability condition was 1.58 units greater than that in the high trial variability condition. Because the intercept difference (1.32) is close to this overall difference (1.58), we can now interpret the latter as related to a bias in assigning confidence between the two conditions, a bias that is somewhat unrelated to the actual differences in evidence.


The final ANOVA was performed on the residual mean square statistic. We interpret the residual mean square as a measure of the "goodness of fit" of the regression line. Any condition with a significantly large residual mean square would indicate that performance was non-optimal, or possibly that the weighted difference model was inappropriate as a descriptive model of confidence rating in that condition. The ANOVA, however, showed no significant effects.
It is important to note that the absence of reliable effects in the regression analyses may result in part from the lack of power associated with the three ANOVAs. Each individual regression of each cell, from which the statistics were derived, had only three data points. It is quite likely, then, that a single aberrant data point could drastically influence the regression slope, producing a high degree of variability in the derived statistics. An alternative approach, in which data are averaged across subjects before computing the regression equation, is described below.


Order effects. If changing the order of stimulus presentation in a sequential integration task alters the response, then, in general, order effects are present. There are many possible causes of order effects. Primacy effects are found when earlier stimuli in a serial integration task are given relatively more weight than later stimuli. This is a manifestation of the phenomenon of anchoring, in which an initial hypothesis is accepted based upon the first arriving evidence and is then held more tenaciously than warranted when evidence arrives to disconfirm it (Kahneman & Tversky, 1973; Lopes, 1982). Correspondingly, recency effects are found when later stimuli are given more weight than earlier stimuli. The most common explanation of the recency effect is that the earlier stimuli are given less weight due to memory loss. The recency effect seems particularly relevant to this study, because memory loss was identified as a possible cause of the significant interaction between time and problem size, specifically in the 10-cue, slow condition.


In the present study the cues systematically alternated in favor of one hypothesis and then the other. The order of alternation was balanced so that on half of the trials ("correct first") the evidence concerning the correct hypothesis was presented as the first cue and that concerning the incorrect hypothesis as the last cue. On the other half ("correct last"), evidence for the incorrect hypothesis was presented first and evidence for the correct hypothesis last. Hence, if primacy were a dominant factor, then accuracy (and confidence) on correct-first trials should be greater than on correct-last trials. On the other hand, if information integration were dominated by recency, then correct-last trials should be favored. A two-way repeated measures ANOVA was performed on the confidence measures, and this effect was not statistically significant (F(1,7) = 1.24, p = .30). The absence of an effect here does not necessarily mean that primacy and recency were absent. It does imply that if such effects were present, they probably balanced each other in magnitude. The problems were not ordered in such a way as to choose between these particular hypotheses.
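The logic of the correct-first versus correct-last comparison can be illustrated with a hypothetical recency model in which each cue's valence is discounted geometrically by the number of cues that follow it. The weighting scheme, the decay value, and the cue valences are assumptions for illustration only; the report does not fit such a model.

```python
from typing import List, Tuple

# (supports_correct, valence) in presentation order; valences are hypothetical.
Cue = Tuple[bool, float]


def decayed_net_evidence(sequence: List[Cue], decay: float) -> float:
    """Net evidence for the correct hypothesis under a hypothetical recency model.

    A cue followed by k later cues is weighted by decay ** k, so with
    decay < 1 earlier cues count for less (as if partly forgotten).
    """
    n = len(sequence)
    net = 0.0
    for position, (supports_correct, valence) in enumerate(sequence):
        weight = decay ** (n - 1 - position)
        net += weight * valence if supports_correct else -weight * valence
    return net


def alternate(first: List[Cue], second: List[Cue]) -> List[Cue]:
    """Interleave two equal-length cue lists, starting with `first`."""
    sequence: List[Cue] = []
    for a, b in zip(first, second):
        sequence.extend([a, b])
    return sequence


correct_cues = [(True, 30.0)] * 3
incorrect_cues = [(False, 25.0)] * 3
correct_first = alternate(correct_cues, incorrect_cues)
correct_last = alternate(incorrect_cues, correct_cues)

for decay in (1.0, 0.8):
    print(decay,
          round(decayed_net_evidence(correct_first, decay), 3),
          round(decayed_net_evidence(correct_last, decay), 3))
```

With no decay both orders yield the same net evidence (15); with a decay of 0.8 the correct-first trial turns slightly negative while the correct-last trial rises to about 20.5, which is the asymmetry a recency-dominated integrator would show.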
Model of Information Integration

Processing  strategies  (self-report).  Subjects  were  asked  to  explain 
their  information  integration  strategy  for  the  entire  decision  task.  More 
specifically,  three  processes  were  of  interest:  the  assessment  of  the 
valence  of  each  individual  cue,  the  integration  of  successive  cues,  and 
the  final  assessment  of  confidence. 


All subjects reported using a multiplying model, i.e., diagnosticity x reliability, to assess cue valence. The strategies for integration of successive cues reported by the subjects were less unanimous. Three subjects reported using an adding model in which running totals were kept throughout a trial for each alternative hypothesis. Four subjects reported using a "random walk" model, in which successive cues were added to (or subtracted from) a single running balance. This model intuitively places less demand on working memory. Additionally, one subject reported using a somewhat primitive heuristic in which the "random walk" model was used, but was further simplified by using fingers and fractions of fingers to represent the running balance. Finally, subjects were asked to explain their strategy for assessing confidence. Five subjects stated that they used a weighted difference strategy. Three subjects claimed to use only the absolute difference of information presented for the two alternative hypotheses.
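The two self-reported integration strategies can be contrasted in a short sketch. The cue valences are hypothetical, and the function names are illustrative rather than taken from the report.

```python
from typing import Dict, List, Tuple

# (hypothesis, valence) pairs in presentation order; values are hypothetical.
Cue = Tuple[str, int]


def adding_model(cues: List[Cue]) -> Dict[str, int]:
    """Keep a separate running total for each hypothesis."""
    totals = {"North": 0, "South": 0}
    for hypothesis, valence in cues:
        totals[hypothesis] += valence
    return totals


def random_walk_model(cues: List[Cue]) -> int:
    """Keep one running balance: + for North, - for South."""
    balance = 0
    for hypothesis, valence in cues:
        balance += valence if hypothesis == "North" else -valence
    return balance


cues = [("North", 42), ("South", 25), ("North", 32),
        ("South", 36), ("North", 25), ("South", 21)]
totals = adding_model(cues)
balance = random_walk_model(cues)
print(totals, balance)  # {'North': 99, 'South': 82} 17
```

Both strategies agree on the net difference (99 - 82 = 17), but only the adding model retains the total evidence (181) needed to form a weighted difference; a subject keeping a single balance would have to track the total separately.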


Regression analysis. The results of the self-report exercise have led us to take a closer look at the model subjects used for information integration. Three subjects reported using a preference model, where net difference = value1 - value2, rather than the weighted difference model. These two models are qualitatively very different. Recall that in this experiment successive cues alternate in support of the two alternative hypotheses, and an equal number of cues is presented for each hypothesis. The weighted difference model does not have directional constraints. If one hypothesis is strongly supported and two additional cues of equal weight are then presented, one for each alternative hypothesis, so that the net evidence of these two cues is zero, then a confidence decrement is predicted. This makes the weighted difference model qualitatively similar to an averaging model. The net difference model, by contrast, has directional constraints akin to the Bayesian model. This model does not predict a decrease in confidence when additional neutral or mild evidence is presented.
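The different predictions of the two models for neutral evidence can be checked numerically. The summed valences below are hypothetical; the example adds one cue of equal valence to each hypothesis, as in the scenario just described.

```python
def net_difference(v_a: float, v_b: float) -> float:
    """Preference (net difference) model: V_A - V_B."""
    return v_a - v_b


def weighted_difference(v_a: float, v_b: float) -> float:
    """Weighted difference model: |V_A - V_B| / (V_A + V_B), in percent."""
    return 100 * abs(v_a - v_b) / (v_a + v_b)


before = (60, 20)           # hypothetical summed valences, A strongly favored
after = (60 + 20, 20 + 20)  # one cue of valence 20 added for each hypothesis

for label, (v_a, v_b) in [("before", before), ("after", after)]:
    print(label, net_difference(v_a, v_b), round(weighted_difference(v_a, v_b), 1))
# before 40 50.0
# after 40 33.3
```

The net difference is unchanged by the neutral pair, while the weighted difference drops from 50% to about 33%, which is the confidence decrement an averaging-like model predicts.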


Since we did not "track" the confidence judgments of our subjects as they progressed through the sequence of cues (only a single rating was given after all cues were presented), we were unable to distinguish between the models on the basis of the impact of weak or neutral evidence. Instead, we employed a means of model testing that capitalized on the fact that we had three different problem sizes, defining three different levels of total evidence. As described in the Method section, the three evidence levels of our cues were designed to create equally spaced intervals of weighted difference (net evidence for the favored hypothesis divided by total evidence). If subjects used this variable to calibrate their subjective confidence, then confidence should vary linearly with weighted difference. On the other hand, the three objective evidence levels were not equally spaced in terms of net difference (equal values of relative difference produce different values of net difference if the total amount of evidence varies). Hence, if subjects used the more optimal net difference strategy, the regression of subjective confidence onto relative (weighted) difference should show a non-linear component.
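The spacing argument can be checked numerically. Because net difference equals weighted difference times total evidence, equally spaced weighted-difference levels map onto unequally spaced net differences when the total varies across levels. The totals below are hypothetical; the report does not list them, and `net_from_weighted` is an illustrative helper name.

```python
def net_from_weighted(wd_percent: float, total: float) -> float:
    """Net difference implied by a weighted difference and a total: wd x total."""
    return wd_percent * total / 100


levels = [7.5, 17.5, 27.5]  # equally spaced weighted differences (percent)
totals = [120, 150, 200]    # hypothetical total evidence at each level
nets = [net_from_weighted(wd, t) for wd, t in zip(levels, totals)]
gaps = [b - a for a, b in zip(nets, nets[1:])]
print(nets, gaps)  # [9.0, 26.25, 55.0] [17.25, 28.75]
```

With a constant total the mapping would be a pure rescaling and the equal spacing would be preserved; it is the variation in total evidence that makes a subject who is linear in net difference trace a curve when confidence is plotted against weighted difference.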


Regressions of confidence on net difference, and of confidence on percent difference, were performed for each subject in 24 different trial conditions (code(2) x time(2) x problem size(3) x trial variability(2)). In order to determine if subjects were using one mode of information integration over another, the objective evidence for the favored hypothesis on each trial was computed in two different fashions: first, as the net evidence in favor of one hypothesis over the other; second, as the ratio of this net difference to the total amount of evidence presented for both hypotheses on the trial. A two-way (decision model x replication) repeated measures ANOVA was performed on the correlation statistics of the two different regressions. Correlations were interpreted as a measure of the degree of relationship between confidence ratings and the respective model of integration. No significant difference in correlation between the two models was found (F(1,7) = .77, p = .41).


One  final  analysis  was  conducted  to  evaluate  the  difference  between 
these  two  models.  Figure  8  illustrates  the  regression  of  the  mean 
confidence  rating  on  net  difference,  and  below  that  the  regression  of 
confidence  on  percent  or  relative  difference.  The  mean  percent  difference 
was  computed  in  each  of  the  three  levels  of  weighted  difference  and 
plotted  against  the  respective  mean  confidence  ratings  over  subjects.  A 
similar  procedure  was  used  to  plot  confidence  on  net  difference.  Note 
that  the  ordinate  (confidence)  values  are  equivalent  between  the  two 
graphs  (despite  the  expanded  scale  on  the  bottom),  but  the  abscissa  values 
are  equally  spaced  on  the  weighted  difference  scale  (as  we  had  created 
them),  while  they  are  not  on  the  net  difference  scale. 


As Figure 8 indicates, both models seem to do an excellent job of predicting confidence with averaged group data. There is, however, an apparently better relationship depicted in the plot of confidence on net difference. This is, of course, only a cursory analysis. Recall, however, that weighted difference and net difference are highly related, so even this small difference in fit may constitute substantial evidence in favor of the more optimal net difference model.
The data were further broken down and plotted in the same fashion
separately for the two groups that differed in the strategies they had
reported using. These self-reports proved to be fairly accurate
indicators of the integration strategy revealed in the data. The three
subjects who reported using the net difference strategy showed an
almost perfectly linear relation between net difference and subjective
confidence; the fit was much poorer with weighted difference. The
behavior of the five self-reported weighted difference subjects, on the
other hand, seemed to reflect a compromise between the two strategies.

Method: Experiment 2

The design of Experiment 2 was similar to that of Experiment 1 with
three exceptions. First, the problem size remained constant at eight
total cues, four for each alternative hypothesis. Second, the display
time was held constant at the slow (5-second) presentation rate.

Finally, a new level of trial variability was implemented. All trials in
this experiment utilized high variability cues for both alternative
hypotheses. More specifically, cues of high diagnosticity and low
reliability were presented for one hypothesis, and cues of low
diagnosticity and high reliability were presented for the alternative
hypothesis. Formally, in terms of any model of information integration,
neither hypothesis should be favored by this configuration, since the
valence of each cue should be insensitive to the relative contributions of
reliability and diagnosticity (Johnson, Cavanagh, Spooner, & Samet, 1973).
However, if subjects tended to treat one variable differently from the
other (i.e., to discount differences in reliability by applying the "as if"
heuristic; Wickens, 1983), then biases should become evident. The trials
of Experiment 2 were configured so that the incorrect hypothesis was
always favored by the cues of low reliability. Hence, to the extent that
subjects overestimated reliability, these inflated cue valences should
bias them toward picking the incorrect hypothesis; i.e., their error rate
should increase.
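The logic of this manipulation can be illustrated numerically. The sketch below assumes, following Appendix B, that a cue's weight is the product of diagnosticity and reliability, and it models the "as if" heuristic with a hypothetical discount parameter `alpha` that pulls perceived reliability toward unity (`alpha = 0` is the normative weighting; `alpha = 1` ignores reliability entirely). The cue values are illustrative and are not the report's stimuli.

```python
# Hypothetical trial in the style of Experiment 2: four cues per hypothesis,
# (diagnosticity, reliability) on a 0-100 scale. Values are illustrative only.
CORRECT = [(30, 65)] * 4    # low diagnosticity, high reliability
INCORRECT = [(90, 20)] * 4  # high diagnosticity, low reliability


def perceived_weight(diagnosticity: float, reliability: float, alpha: float) -> float:
    """Cue weight when perceived reliability is pulled toward unity by the fraction alpha."""
    d, r = diagnosticity / 100.0, reliability / 100.0
    r_perceived = r + alpha * (1.0 - r)  # alpha = 0: normative; alpha = 1: reliability ignored
    return d * r_perceived


def net_for_correct(alpha: float) -> float:
    """Net evidence for the correct hypothesis; negative values favor the incorrect one."""
    correct = sum(perceived_weight(d, r, alpha) for d, r in CORRECT)
    incorrect = sum(perceived_weight(d, r, alpha) for d, r in INCORRECT)
    return correct - incorrect


for alpha in (0.0, 0.25, 0.5, 1.0):
    print(f"alpha={alpha:.2f}  net evidence for correct hypothesis: {net_for_correct(alpha):+.3f}")
```

With these assumed values, normative weighting favors the correct hypothesis by a small margin (+0.060), while even a modest pull of reliability toward unity (`alpha = 0.25`) reverses the sign, which is the kind of error the design was meant to expose.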


Results: Experiment 2

Accuracy. Table 3 presents accuracy data averaged over subjects and
over display format. The accuracy measure for each subject was determined
by the proportion of errors made with the two opportunities (one with each
display type) in a given condition. Hence, raw accuracy values for a
given subject were either 0, .5, or 1. Arc sine transformations were
performed on accuracy percentages. A two-way (weighted difference x
replication) repeated measures ANOVA was conducted. As in Experiment 1,
the main effect of weighted difference was statistically significant
(F(1,7) = 5.86, p < .01). Decision accuracy was clearly poorest in the
trials of low weighted difference. As described above, the evidence in all
trials favored the hypothesis having low diagnosticity and high
reliability. Therefore, the fact that in 44% of all low difference trials
the subject chose the incorrect hypothesis, supported by high
diagnosticity and low reliability, indicates that reliability tends to be
overweighted to a greater extent than diagnosticity. The 44% error rate
in this condition is notable in that it compares with an error rate of
less than 6% in the corresponding condition of Experiment 1, when
conditions were not created to induce a bias. While the present bias
might be expected to be present at all three levels of weighted
difference, its presence here only in the low weighted difference
condition is not too surprising, because a constant bias would have a
relatively smaller effect on the greater weighted differences.
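Because each subject's raw accuracy in a condition could only be 0, .5, or 1, the proportions were arcsine transformed before the ANOVA. The report does not state the exact form of the transform; the sketch below assumes the common variance-stabilizing form arcsin(sqrt(p)), in radians, and the helper names are hypothetical.

```python
import math


def subject_accuracy(correct_verbal: bool, correct_spatial: bool) -> float:
    """Proportion correct over the two opportunities (one per display type): 0, .5, or 1."""
    return (int(correct_verbal) + int(correct_spatial)) / 2


def arcsine_transform(p: float) -> float:
    """Variance-stabilizing arcsine square-root transform of a proportion, in radians."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be a proportion between 0 and 1")
    return math.asin(math.sqrt(p))


# The three raw scores a subject could obtain in one condition.
for verbal, spatial in [(False, False), (True, False), (True, True)]:
    p = subject_accuracy(verbal, spatial)
    print(f"raw accuracy {p:.1f} -> transformed {arcsine_transform(p):.4f}")
```

Some authors use 2 arcsin(sqrt(p)) instead; multiplying every score by a constant rescales the sums of squares uniformly, so the F ratios are unaffected by that choice.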

Confidence. Because we have demonstrated that the confidence scale
is used as an analog of evidence, we expect the dimensional bias for
reliability to decrease confidence at all three levels of weighted
difference. That is, since the cues in support of the incorrect
hypothesis (high diagnosticity and low reliability) are overestimated,
this would yield a decrease in confidence, as predicted by the weighted
difference model.


Judgments of confidence were transformed as in Experiment 1. Table 3
presents these data averaged across subjects. A three-way (code x
weighted difference x replication) ANOVA was conducted. The main effect
of code and the code x weighted difference interaction were not
significant, supporting the interpretations made for the absence of these
effects in the accuracy ANOVA. The main effect of weighted difference was
very large (F(7,14) = 18.84, p < .0001). This demonstrates that, as in
Experiment 1, subjects become increasingly confident as the difference in
evidence between the competing hypotheses grows. Most important for the
hypothesis under consideration, the mean confidence rating in this
experiment was 12.69 on the experimental confidence scale. The overall
confidence for the corresponding eight-cue conditions in Experiment 1 was
14.16. The decrease in confidence in Experiment 2 supports the prediction
of a dimensional overestimation bias; i.e., cues of high diagnosticity and
low reliability are overestimated, pulling the integration of information
away from the correct hypothesis and toward the neutral point, and hence
producing a final judgment of reduced confidence.


Discussion


Engineering Applications

Code effects. The results of Experiment 1 indicate that decision
accuracy is enhanced with the integrated spatial display format. Two
interpretations of this finding have been discussed. We initially
predicted that the integral dimensions of the spatial code format display
would simplify the integration process and, therefore, enhance decision
accuracy, especially under conditions of time stress. This latter
interpretation is now uncertain: the code x time interaction
demonstrates that the spatial code format has a greater benefit when
information is presented at a relatively slower rate. Alternatively, the
spatial code format was predicted to be more compatible with the analog
nature of the internal scale of confidence maintained in working memory.
Because of this greater compatibility, the internal scale is more easily
revised and is less sensitive to the decay of positioning with the
increasing interval between successive cues. The spatial display was
predicted to enhance the encoding process and have positive effects on
working memory. The code x time interaction effects support this
interpretation.

Table 3

Percent Accuracy (proportion correct)

                     Weighted Difference
                     5-10%      15-20%
                     0.563      0.938

Absolute Confidence

                     Weighted Difference
                     5-10%      15-20%
Verbal               11.00      12.25
Spatial              10.38      13.50


Speed effects. It was our intention to evaluate the effect of the
spatial code format under conditions of time stress. Our results
indicate, however, that the time manipulations in this study produced
more of a memory loss effect than a time stress effect. Both the code x
time and problem size x time interactions have been interpreted in terms
of memory loss, although these could also result from practice effects,
because the slow speed was presented first.

Non-optimality in Decision Making

In addition to their human engineering implications, the results of
the present experiments also bear on certain cognitive phenomena in
information integration and decision making. Three of these phenomena
will be described.


1. Models of information integration. The consistent effects of
weighted difference on confidence in both experiments demonstrate the
ability of the subjects to extract more evidence, and therefore increase
their confidence, as more diagnostic evidence is presented. Five subjects
did in fact report using a weighted difference model for confidence
judgment. On the other hand, three subjects claimed to consider the net
difference of information presented for the two competing hypotheses. It
is unclear which of these two models, weighted difference or net
difference, is the better predictor of subjective confidence. An ANOVA
performed on the correlations of both models with confidence was
inconclusive. A comparison of the regression plots of confidence on both
of these models does, however, suggest that net difference is the better
overall predictor of confidence for group data. The plot for the three
subjects who claimed to use the net difference was perfectly linear with
this variable. The plot for the remaining five subjects could be equally
well accounted for by either of the two models. Since the net difference
model is more appropriate in the Bayesian-type inference task used here
than is the weighted model (reflecting an averaging process), we conclude
that subjects in the present paradigm showed tendencies toward this form
of optimal behavior. As noted, of course, we were unable to assess the
different models directly in terms of the impact of different cues
presented in sequence. Future research will address this issue.
2. Individual cue values. Examining the integration of the individual
dimensions of reliability and diagnosticity at a finer grain revealed two
further effects. Experiment 1 demonstrated that a negative correlation
between these variables (producing, for the spatial display, an increase
in shape variability) reduced both the accuracy and confidence of
prediction. The fact that this reduction was a main effect that did not
differ between the verbal and spatial formats suggests that the source of
difficulty was not the physical variability in the shape of the
rectangles, but rather was related to problems encountered in integrating
negatively correlated data.


Individual cue values were also examined in Experiment 2, whose data
suggested that subjects tended to over-value low levels of reliability,
thereby reducing both their accuracy and confidence relative to the
values observed in Experiment 1. These results are important in that they
indicate that subjects truly did respond to the meaning of the two cues
and did not simply treat them as arbitrary numbers. Had arbitrary numbers
been combined, there would be little reason for subjects to treat one
differently from the other. In the spatial condition it is, of course,
possible that the asymmetry was related to the physical dimensions: the
same data would have been produced had subjects overestimated the base of
the rectangles (depicting reliability) relative to the height. Yet two
factors indicate that this did not occur. First, this bias would be
contrary to the bias typically observed in the horizontal-vertical
illusion, in which vertical segments are overestimated in length relative
to horizontal ones. Second, since the main effect of code was not
significant in Experiment 2, there appears to have been an equal loss of
judgment for both verbal and spatial displays, indicating that the source
was related to cognitive information integration and not to perceptual
display biases. Thus it appears that the subjects were processing the
reliability measure as if its value were closer to that of the larger
diagnosticity value. The general finding that differences in reliability
are ignored or "unitized" in multiple-source information integration tasks
has been reported in a number of other investigations (see Wickens, 1983,
for a summary). The demonstration of the "as if" heuristic here is
consistent with those findings.


3. Serial effects. The problem size, or number of cues within a trial,
did not have an effect on confidence. This is contrary to many findings
of similar studies with serial integration. The absence of a problem size
effect is indicative of optimal performance: if subjects have evaluated
information to produce a tentative confidence rating and are then
presented additional information, they should not change their confidence
rating unless the weighted difference of information changes.
Another indicator of optimal performance was the absence of order
effects. Recall that, in accordance with our experimental design, on one
half of the trials the first cue favored the most likely hypothesis,
while the last cue favored the least likely hypothesis. Conversely, on
the other half of the trial conditions, the last cue favored the most
likely hypothesis, while the first cue did not. Thus, a greater mean
confidence in the former or latter trial conditions would indicate a
primacy or recency effect, respectively. We did not find a significant
effect, although this must be interpreted with caution. It is certainly
possible that neither effect is present, and hence the ANOVA did not show
a significant effect. On the other hand, both primacy and recency may have
been operating and simply cancelled each other out. In future work, an
experimental design that does not employ a strictly alternating sequence
of cue presentation will be used to clarify this somewhat ambiguous
interpretation.

Implications  and  Future  Research 

Spatial code format displays should be considered for further study
and application in tactical C3 systems, as well as in other areas of
decision making in which responses and the internal representations
underlying those responses are analog in form. This display format seems
particularly beneficial when decision performance is limited by memory
loss. Additional research will be necessary to gain a better
understanding of the effects of time stress on the utility of the spatial
code format. Decreasing both the problem size and the time of cue
presentation would possibly limit the effects of memory loss and better
define the effects of time stress on integration task performance.


The consistent effects of weighted difference on confidence ratings
demonstrate the efficacy of this model for describing decision
performance in the present paradigm. A simple net difference model,
however, does just as well as, if not better than, the weighted difference
model in describing the amount of subjective evidence extracted by a
subject in a decision trial. Further research will be necessary to
clarify the relevance of these two descriptive models. It is hoped that
an experimental design that varies trial evidence in accordance with both
models, but in an orthogonal manner, would isolate the effects of both
models.
Finally, additional insight into how the "as if" heuristic operates on
different sources of information is required. We have good reason to
believe that the operation of this heuristic on cues with high
diagnosticity and low reliability results in an overestimation bias of the
cue valence or worth. This "riskiness" in interpreting unreliable data is
possibly the result of a cognitive simplification: the operator reduces
the load imposed on working memory by ignoring reliability or setting its
value at unity. An experimental design in which these cues, high in
diagnosticity and low in reliability, are pitted against cues of equal
total evidence but of moderate diagnosticity and reliability might
demonstrate the exact nature of this bias. Further knowledge of this
effect could have a great impact on C3 system performance if considered
in both training and display design.


Appendix A

Visual Model of Information Integration
A Simulated Tactical Decision-Making Task

Imagine that you are the commander of an Army unit. You have just
been advised that the threat force is preparing an assault against
your sector (area) of responsibility. Your sector is very large, and
you cannot possibly cover (defend) the entire area. You can,
however, successfully block the threat assault if you know where in
your area he will attack and you subsequently concentrate your forces
there. The question is: will the enemy attack from the North or from
the South?


[Diagram: FRIENDLY forces, YOUR AREA OF RESPONSIBILITY, and the THREAT force]


You will be presented several pieces of evidence, or cues. Your task
will be to weigh the cues presented and decide which alternative is
most likely, i.e., will the threat assault in the North or the South of
the sector?
Appendix B

How  Much  Weight  to  Assign  Each  Cue 

A cue is a piece of evidence that supports one of the possible
alternatives. The weight or valence of a cue is a function of the
cue diagnosticity and the cue reliability.

Cue diagnosticity is determined by the relevance of the information
to the decision at hand. Example: Consider a jury weighing pieces
of evidence (cues) in a murder trial.

Cue 1: A character witness has testified that the defendant has
a "bad temper." This information is at best circumstantial and
not very relevant. This cue would have a very low diagnostic
weight, possibly 10 on a scale of 0-100.

Cue 2: The defendant was seen fleeing the scene of the murder.
This cue is very relevant and would surely implicate the
defendant in the murder. This cue would have a very high
diagnostic weight, possibly 90 on a scale of 0-100.

Cue reliability is determined by the credibility of the source of
information. A source can be a person, thing, or activity from which
the information was originally obtained. Example: Consider two
different cases of Cue 2 above.

Case 1: The defendant was seen fleeing the scene of the
murder. The witness was a policeman responding to the call for
help. The policeman would have great credibility, and the
reliability of this cue would be very high, possibly 90 on a
scale of 0-100.

Case 2: The defendant was seen fleeing the scene of the murder.
The witness is a known felon who is also a suspect in the murder.
This witness would have very little credibility. The
reliability of this cue would be low, possibly 10 on a scale of
0-100.

Therefore, cue weight = diagnosticity x reliability. What weight
would you assign to the video replay of John Hinckley's assault on
President Reagan if it were used as evidence in a trial?
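The product rule above can be applied to the two cases of Cue 2. The instructions give only the product; the sketch below additionally assumes that the product is divided by 100 so that weights stay on the same 0-100 scale as the inputs. Cue 1 has no stated reliability in the example, so only the two cases of Cue 2 are computed, and the function name is hypothetical.

```python
def cue_weight(diagnosticity: float, reliability: float) -> float:
    """Cue weight = diagnosticity x reliability, rescaled to a 0-100 scale (assumption)."""
    for name, value in (("diagnosticity", diagnosticity), ("reliability", reliability)):
        if not 0 <= value <= 100:
            raise ValueError(f"{name} must be between 0 and 100")
    return diagnosticity * reliability / 100


# Cue 2 (defendant seen fleeing the scene): diagnosticity about 90 in both cases.
case_1 = cue_weight(90, 90)  # witness is a responding policeman
case_2 = cue_weight(90, 10)  # witness is a known felon and fellow suspect

print(f"Case 1 weight: {case_1:.0f}")
print(f"Case 2 weight: {case_2:.0f}")
```

Under this rule the same highly relevant observation carries a weight of 81 when reported by the policeman but only 9 when reported by the felon, because the low reliability scales down the high diagnosticity.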